Feat/open pulse ontology v2.0.0 #28

Merged
caviri merged 127 commits into `develop` from `feat/open-pulse-ontology-v2.0.0`
May 6, 2026

Conversation

@caviri (Member) commented May 6, 2026

No description provided.

- Added new entries to .gitignore for development and internal files.
- Updated `pyproject.toml` to include `logfire` and modified dependency specifications.
- Enhanced `uv.lock` with new package versions and added `babel`, `backrefs`, and `ghp-import` packages.
caviri added 28 commits March 5, 2026 08:25
…text handling. Default `include_upstream_stage_outputs_in_prompt` to true in `PipelineOrchestrator`, ensuring downstream LLM agents receive serialized upstream context. Strengthen reconciliation logic to prioritize ROR-backed organizations, merging high-confidence duplicates and providing detailed diagnostics in `/v2/extract`. Update prompts to include acronym disambiguation guidance and context validation rules, improving overall entity resolution accuracy.
… pipeline. Implement `llm_dedup` and `llm_critic` stages for enhanced entity resolution, ensuring fail-open behavior and detailed diagnostics. Update organization identity search tool to query ROR and Infoscience together, improving coherence in identifier assignments. Enhance context handling in `PipelineOrchestrator` and update prompts to include new tools and guidance for better entity reconciliation.
…idation. Introduce `make_duckduckgo_search_tool` for external context retrieval and `hash_user_email_tool` for email anonymization. Implement `link_veracity` stage for automatic link validation in `/v2/extract`, ensuring comprehensive diagnostics and improved entity pruning. Update orchestrator to compile context summaries and streamline agent interactions, enhancing overall data integrity and user experience.
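The commit above names `hash_user_email_tool` but does not show how the anonymization works. A minimal sketch of one plausible approach follows, assuming a SHA-256 digest over the normalized address. The function name `hash_user_email`, the `salt` parameter, and the normalization rules are all hypothetical and not taken from the repository.

```python
import hashlib


def hash_user_email(email: str, salt: str = "") -> str:
    """Return a hex SHA-256 digest of a normalized email address.

    Hypothetical helper: the real tool's hashing scheme is not shown
    in this pull request.
    """
    # Normalize so that case and surrounding whitespace do not change the hash.
    normalized = email.strip().lower()
    # An optional salt (an assumption here) makes digests harder to reverse by lookup.
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()
```

With this sketch, `hash_user_email(" Jane@Example.org ")` and `hash_user_email("jane@example.org")` yield the same digest, so the same person maps to the same anonymized key.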
…roduce `include_context_summary` parameter in `/v2/extract` to optionally include compiled context summaries in responses. Enhance `llm_critic` stage with tools for external context verification and owner provenance checks, ensuring better entity relevance assessment. Update documentation and tests to reflect these enhancements, improving overall extraction accuracy and user experience.
…rrect endpoint. This change ensures proper connectivity for the model's API interactions.
…cements. Introduce `POST /v2/extract` for body-based extraction, maintaining existing `GET /v2/extract/{full_path}` functionality. Implement `V2ExtractRequest` model for request validation and update API to handle new endpoint. Enhance documentation and tests to reflect these changes, ensuring improved extraction capabilities and user experience.
…ce the `gimie` function to manage exceptions more effectively, including specific handling for HTTP and connection errors. Update repository analysis logging to provide clearer insights on GIMIE output and analysis success. Additionally, refactor cache management to ensure robust data persistence with improved error logging for cache operations.
… devcontainer configuration to include SSH support and modify post-create commands. Introduce a new script to set the VS Code user's password at container start, enhancing security and usability.
…configuration. Add instructions for local .env setup to enhance security and usability.
…vironment. Add .env.example for environment variable configuration and update devcontainer.json to utilize docker-compose for service management, enhancing container orchestration and usability.
…improve post-create command. Add UV_CACHE_DIR to avoid root-owned cache issues and ensure proper cache directory creation during setup.
…optional DNS configuration. Update documentation to clarify environment variable setup for improved container networking and usability.
…tup. Add DNS resolver instructions to .env.example and include .uv-cache/ in .gitignore to prevent cache files from being tracked. Adjust schema paths in AGENTS.md and scripts for consistency with new directory structure.
…ship assessment and linked entities enrichment. Add main agent for fetching repository information, EPFL assessment prompts, and linked entities enrichment tools. Implement organization enrichment module for enhanced metadata analysis. Establish logging and configuration validation for robust agent management.
… warnings from agents and modules. Clean up unused files related to legacy imports across various components, streamlining the codebase for improved maintainability.
…text for improved context loading. Remove unused graph-related imports and clean up deprecated code in API and related modules, enhancing maintainability and performance.
…mponents. Clean up unused imports and environment variable checks, streamlining the codebase for improved maintainability and performance.
…apture_provider_snapshots.py, generate_mock_data.py, and related testing files to streamline the codebase and enhance maintainability.
…umentation. Update .env.example with additional environment variables and descriptions for improved clarity. Modify .gitignore to include logs and cache directories. Introduce new async job handling for the extraction process in the API, along with companion endpoints for job status retrieval. Update AGENTS.md and API reference documentation to reflect new tools and functionalities.
…etection. Introduce a new pipeline stage for determining parent-child relationships among organizations using an LLM agent. Add prompts for LLM input and output formatting, ensuring compliance with hierarchy rules. Update API to integrate the new stage and handle warnings for rejected relationships.
… Introduce `_is_link_veracity_enabled` and `_resolve_max_concurrent_agents` functions to manage environment variable settings for link verification and concurrent processing limits. Update orchestration logic to utilize these new configurations, enhancing performance and flexibility in the extraction process.
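The commit above describes two helpers that read settings from environment variables. The sketch below shows how such helpers might look. The variable names `LINK_VERACITY_ENABLED` and `MAX_CONCURRENT_AGENTS`, the defaults, and the accepted truthy values are assumptions for illustration; the real names and defaults are not visible in this pull request.

```python
import os
from typing import Mapping, Optional

# Strings treated as "enabled" (an assumption, not taken from the repository).
TRUTHY = {"1", "true", "yes", "on"}


def link_veracity_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless the (hypothetical) flag is set to a non-truthy value."""
    env = os.environ if env is None else env
    raw = env.get("LINK_VERACITY_ENABLED", "true")
    return raw.strip().lower() in TRUTHY


def resolve_max_concurrent_agents(
    env: Optional[Mapping[str, str]] = None, default: int = 4
) -> int:
    """Parse the (hypothetical) concurrency limit, falling back to `default`."""
    env = os.environ if env is None else env
    raw = env.get("MAX_CONCURRENT_AGENTS", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or malformed values fall back rather than crash the pipeline.
        return default
    # Reject zero and negative limits, which would stall the orchestrator.
    return value if value >= 1 else default
```

Passing `env` explicitly keeps the helpers easy to test without mutating the process environment.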
…ndle empty schema:author arrays. This function salvages repository entities by assigning a fallback owner from the reconciled graph when reconciliation results in an empty author array, addressing a known bug in the validation process. Update imports and module exports accordingly.
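The salvage step described above can be sketched as follows. This is only an illustration: the function name `salvage_empty_authors`, the JSON-LD-style dictionaries, and the rule for picking the fallback owner (the first organization or person with an `@id`) are assumptions, since the actual selection logic is not shown here.

```python
from typing import Any

# Entity types eligible as a fallback owner (an assumption for this sketch).
OWNER_TYPES = {"schema:Organization", "schema:Person"}


def salvage_empty_authors(
    repository: dict[str, Any], graph: list[dict[str, Any]]
) -> dict[str, Any]:
    """Return a copy of `repository` with a fallback author if none is set."""
    # Leave repositories that already have authors untouched.
    if repository.get("schema:author"):
        return repository
    # Pick the first candidate owner found in the reconciled graph.
    owner = next(
        (e for e in graph if e.get("@type") in OWNER_TYPES and e.get("@id")),
        None,
    )
    if owner is None:
        return repository
    # Copy instead of mutating the input, so callers keep the original entity.
    patched = dict(repository)
    patched["schema:author"] = [{"@id": owner["@id"]}]
    return patched
```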
…n. This addition enables the Qdrant vector database for enhanced data storage and retrieval capabilities within the development environment. Update the service configuration with appropriate ports, volumes, and restart policies.
…nd enhance .env.example with additional configuration options. Update documentation to reflect new environment variables for improved clarity and usability in the development environment.
…e, RenkuLab, and EPFL Graph indices. Update AGENTS.md and justfile with new indexer commands and documentation for improved clarity and usability. Modify mkdocs.yml to reflect new documentation structure and sections.
…tes and specific `/v2/*` endpoints. Update documentation to reflect new API security requirements, including the introduction of `API_TOKEN` for protected routes. Enhance `justfile` commands to include authorization headers for cache management and extraction tests. Add new `discover` and `hydrate` protocols for federated indexing, along with corresponding CLI commands and documentation.
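The commit above introduces `API_TOKEN` for protected routes and authorization headers in the `justfile` commands. A minimal sketch of a token check follows, assuming a `Bearer` scheme in the `Authorization` header. The scheme and the function name `is_authorized` are assumptions, and the actual framework integration is not shown in this pull request.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Check an `Authorization: Bearer <token>` header against the expected token."""
    # Refuse everything when no token is configured or no header was sent.
    if not expected_token or not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        presented.strip().encode("utf-8"), expected_token.encode("utf-8")
    )
```

Using `hmac.compare_digest` rather than `==` is a common precaution against timing attacks on secret comparison.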
… GitHub Enterprise variable and add new context summary scout mode configuration. Enhance AGENTS.md and getting-started.md with updated Open Pulse Ontology version and additional details on the extraction process. Modify index.md and rag-indices.md to reflect changes in data storage layout and improve clarity on indexer configurations.
…r transient errors and add GitHub rate limit probing. Update .env.example to include configuration for maximum retry attempts. Modify API health checks to report GitHub rate limit status. Introduce validation for organization handles against GitHub API to ensure accurate entity representation.
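The retry behaviour mentioned above, with a configurable maximum number of attempts, can be sketched as follows. The exception class `TransientError`, the function name `call_with_retries`, and the exponential backoff schedule are assumptions for illustration; the commit does not state which errors are retried or how delays are chosen.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. timeouts, 5xx)."""


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The injectable `sleep` argument lets tests record delays instead of actually waiting.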
caviri merged commit b8250e2 into develop on May 6, 2026
3 of 4 checks passed
caviri deleted the feat/open-pulse-ontology-v2.0.0 branch on May 6, 2026 21:24